HEAD ======= <<<<<<< HEAD ======= <<<<<<< HEAD >>>>>>> 759804787aa318daf2c8131d2fc7dc781ad1c015 ======= >>>>>>> e4585b6277d751d1cb48f325ee19042bf80319a3
The five leading causes of death in the United States from 1999 to 2014 are cancer, heart disease, unintentional injury, chronic lower respiratory disease, and stroke. The dataset includes the U.S. Department of Health and Human Services public health regions. Therefore, we can investigate the leading causes of death of each region, and develop accordingly public health policy and remedies.
From the dataset, we obtain the information that the number of potentially excess deaths from the five leading causes in rural areas was higher than those in urban areas.
We then analyzed several factors that might influence the rural-urban difference in potentially excess deaths from the five leading causes, many of which are associated with sociodemographic and ecological differences between rural and urban areas.
Through statistical analysis, our report provides an interactive and straightforward view on the potentially excess deaths from the five leading causes of death in non-metropolitan and metropolitan areas.
The ultimate goal is to bring attention to preventing deaths in the rural areas through improving healthcare services and public health programs.
cod_data = read_csv("./data/NCHS_-_Potentially_Excess_Deaths_from_the_Five_Leading_Causes_of_Death.csv") %>%
clean_names() %>%
na.omit() %>%
filter(!(state == "United States")) %>%
separate(., percent_potentially_excess_deaths, into = c("percent_excess_death"), sep = "%") %>%
mutate(percent_excess_death = as.numeric(percent_excess_death), mortality = observed_deaths/population * 10000, mortality = as.numeric(mortality)) %>%
select(year, age_range, cause_of_death, state, locality, observed_deaths, population, expected_deaths, potentially_excess_deaths, percent_excess_death, mortality, hhs_region)
## Warning: Too many values at 191748 locations: 1, 2, 3, 4, 5, 6, 7, 8, 9,
## 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...
##columns removed
#"state_fips_code" "benchmark" "potentially_excess_deaths" "percent_excess_death" "mortality"
year variable contains data collected from 2005-2015.United States in the state variable.mortality which is calculated by observed_deaths/population * 10000. This variable indicates the number of deathes observed in every 10000 people in the three geographic regions: Metropolitan, Nonmetropolitan and All.region_cod_data = cod_data %>%
select(state, locality, hhs_region, percent_excess_death) %>%
group_by(state,locality, hhs_region) %>%
summarise(mean_ped = mean(percent_excess_death)) %>%
dplyr::filter(!(state == "District of\nColumbia")) %>%
mutate(hhs_region = as.character(hhs_region))
region_lm = lm(region_cod_data$mean_ped~region_cod_data$hhs_region)
rl1 = broom::tidy(region_lm)
kable(rl1)
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | 22.897478 | 2.124541 | 10.777612 | 0.0000000 |
| region_cod_data$hhs_region10 | 6.919893 | 3.302733 | 2.095202 | 0.0379936 |
| region_cod_data$hhs_region2 | -1.678888 | 4.456475 | -0.376730 | 0.7069571 |
| region_cod_data$hhs_region3 | 15.028044 | 3.161418 | 4.753577 | 0.0000050 |
| region_cod_data$hhs_region4 | 27.459220 | 2.776844 | 9.888645 | 0.0000000 |
| region_cod_data$hhs_region5 | 11.065534 | 2.962531 | 3.735162 | 0.0002743 |
| region_cod_data$hhs_region6 | 23.938689 | 3.103091 | 7.714466 | 0.0000000 |
| region_cod_data$hhs_region7 | 12.500192 | 3.302733 | 3.784802 | 0.0002292 |
| region_cod_data$hhs_region8 | 5.138616 | 2.962531 | 1.734536 | 0.0850722 |
| region_cod_data$hhs_region9 | 8.502314 | 3.302733 | 2.574327 | 0.0111054 |
region_locality_lm =lm(region_cod_data$mean_ped ~ region_cod_data$hhs_region+region_cod_data$locality)
rl2 = broom::tidy(region_locality_lm)
kable(rl2)
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | 21.518530 | 2.080719 | 10.3418699 | 0.0000000 |
| region_cod_data$hhs_region10 | 6.592074 | 2.933587 | 2.2471039 | 0.0262563 |
| region_cod_data$hhs_region2 | -0.892122 | 3.959828 | -0.2252931 | 0.8220919 |
| region_cod_data$hhs_region3 | 15.098291 | 2.807614 | 5.3776243 | 0.0000003 |
| region_cod_data$hhs_region4 | 27.131401 | 2.466649 | 10.9992936 | 0.0000000 |
| region_cod_data$hhs_region5 | 10.737714 | 2.631517 | 4.0804272 | 0.0000765 |
| region_cod_data$hhs_region6 | 23.610869 | 2.756320 | 8.5660831 | 0.0000000 |
| region_cod_data$hhs_region7 | 12.172373 | 2.933587 | 4.1493141 | 0.0000587 |
| region_cod_data$hhs_region8 | 4.810797 | 2.631517 | 1.8281458 | 0.0697354 |
| region_cod_data$hhs_region9 | 8.174495 | 2.933587 | 2.7865190 | 0.0060956 |
| region_cod_data$localityMetropolitan | -2.159388 | 1.555863 | -1.3879042 | 0.1674526 |
| region_cod_data$localityNonmetropolitan | 7.279690 | 1.582741 | 4.5994187 | 0.0000096 |
Interpretation: U.S. Department of Health and Human Services public health regions (1 through 10) are used as a categorical variable in the above regressions. Specific region classification is shown in the figure below.
<<<<<<< HEAD
As shown in the above summary tables, with respect to region 1, only region 2 have negative estimated coefficient, indicating less mean percentage of excess death . Specifically, comparing with region 1, region 2 have 1.67% less mean percentage of excess death on average. From region 3 to region 10, the mean percentage of excess death is higher comparing with region 1. Similiar results yield from regression adjusted for Locality. Adjusting locality, region 4 and 6 have top two highest increase in mean percetage of excess death with respect to region 1. Adjusting for different regions, similar result yield that, on average, metropolitan have 2% less than overall mean percetage of excess deaths and nonmetropolitan have 7% more than overall mean percetage of excess death.
[attach image here].
As shown in the above summary tables, with respect to region 1, only region 2 have negative estimated coefficient, indicating less mean percentage of excess death . Specifically, comparing with region 1, region 2 have 1.67% less mean percentage of excess death on average. From region 3 to region 10, the mean percentage of excess death is higher comparing with region 1. Similiar results yield from regression adjusted for Locality. Adjusting locality, region 4 and 6 have top two highest increase in mean percetage of excess death with respect to region 1. Adjusting for different regions, similar result yield that, on average, metropolitan have 2% less than overall mean percetage of excess deaths and nonmetropolitan have 7% more than overall mean percetage of excess death.
<<<<<<< HEAD ======= =======
As shown in the above summary tables, with respect to region 1, only region 2 have negative estimated coefficient, indicating less mean percentage of excess death . Specifically, comparing with region 1, region 2 have 1.67% less mean percentage of excess death on average. From region 3 to region 10, the mean percentage of excess death is higher comparing with region 1. Similiar results yield from regression adjusted for Locality. Adjusting locality, region 4 and 6 have top two highest increase in mean percetage of excess death with respect to region 1. Adjusting for different regions, similar result yield that, on average, metropolitan have 2% less than overall mean percetage of excess deaths and nonmetropolitan have 7% more than overall mean percetage of excess death.
cod_data %>%
mutate(year = as.factor(year))
## # A tibble: 191,748 x 12
## year age_range cause_of_death state locality observed_deaths
## <fctr> <chr> <chr> <chr> <chr> <int>
## 1 2005 0-49 Cancer Alabama All 756
## 2 2005 0-49 Cancer Alabama Metropolitan 556
## 3 2005 0-49 Cancer Alabama Nonmetropolitan 200
## 4 2005 0-49 Cancer Alabama All 756
## 5 2005 0-49 Cancer Alabama Metropolitan 556
## 6 2005 0-49 Cancer Alabama Nonmetropolitan 200
## 7 2005 0-49 Cancer Alabama All 756
## 8 2005 0-49 Cancer Alabama Metropolitan 556
## 9 2005 0-49 Cancer Alabama Nonmetropolitan 200
## 10 2005 0-54 Cancer Alabama All 1346
## # ... with 191,738 more rows, and 6 more variables: population <int>,
## # expected_deaths <int>, potentially_excess_deaths <int>,
## # percent_excess_death <dbl>, mortality <dbl>, hhs_region <int>
p <- cod_data %>%
plot_ly(
x = ~expected_deaths,
y = ~observed_deaths,
size = ~population,
color = ~cause_of_death,
frame = ~hhs_region,
text = ~state,
hoverinfo = "text",
type = 'scatter',
mode = 'markers'
) %>%
layout(title = "Change of Standardized Mortality Ratio in National Public Health Regions")
p
<<<<<<< HEAD
cod_data %>%
group_by(cause_of_death) %>%
ggplot(aes(x = locality, y = mortality, fill = cause_of_death)) +
geom_col() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
labs(title="Locality vs. Mortality") +
labs(x="locality", y="mortality")

This bar graph shows the distribution of cause of death within mortality in the three geographic regions: Metropolitan, Nonmetropolitan and All. We observe that the weight of each composition of cuases of death is the same regardless of the locality. Nonmetropolitan area seems to have the highest mortaility, the number of deathes observed in every 10000 people.
cod_data %>%
group_by(cause_of_death) %>%
ggplot(aes(x = locality, y = mortality)) + geom_boxplot(aes(color = cause_of_death), na.rm = T) +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
labs(title="Locality vs. Mortality") +
labs(x="locality", y="mortality")

We observe that the median of mortaility is the highest among all 5 causes of death, followed by heart disease, unintentional injury, heart disease, chronic lowee respiratory disease and stroke. This pattern remains consistant when we subdivide locaility into metropolitan region and nonmetrpolitan region, which is identical to our previous finding. All 5 cuases of death are right skewed regardless of the locality.
mortality_lm = lm(mortality ~locality, data = cod_data)
summary(mortality_lm)
##
## Call:
## lm(formula = mortality ~ locality, data = cod_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.1454 -2.9547 -1.2983 0.9988 19.8959
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.10331 0.01672 245.474 <2e-16 ***
## localityMetropolitan -0.22576 0.02372 -9.519 <2e-16 ***
## localityNonmetropolitan 1.14451 0.02438 46.942 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.317 on 191745 degrees of freedom
## Multiple R-squared: 0.01821, Adjusted R-squared: 0.0182
<<<<<<< HEAD
=======
<<<<<<< HEAD
## F-statistic: 1778 on 2 and 191745 DF, p-value: < 2.2e-16
interpretation on linear regression: Here we obtain the linear regression model of mortality = 4.103 - 0.226localityMetropolitan -1.145localityNonmetropolitan. We expect to see 0.22576 less deaths for every 10000 people in the metropolitan region as compared to all regions. We expect to see 1.14451 more death for every 10000 people in the nonmetropolitan region as compared to all regions.
======= <<<<<<< HEAD >>>>>>> e4585b6277d751d1cb48f325ee19042bf80319a3 ## F-statistic: 1778 on 2 and 191745 DF, p-value: < 2.2e-16 rl3 = broom::tidy(mortality_lm) kable(rl3)| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | 4.1033134 | 0.0167159 | 245.474203 | 0 |
| localityMetropolitan | -0.2257558 | 0.0237156 | -9.519299 | 0 |
| localityNonmetropolitan | 1.1445051 | 0.0243812 | 46.942205 | 0 |
| **interpretation on linear | regression:* | * | ||
| Here we obtain the linear | regression mo | del of morta | lity = 4.103 | - 0.226localityMetropolitan -1.145localityNonmetropolitan. |
| We expect to see 0.22576 l | ess deaths fo | r every 1000 | 0 people in t | he metropolitan region as compared to all regions. |
| We expect to see 1.14451 m | ore death for | every 10000 | people in th | e nonmetropolitan region as compared to all regions. |
interpretation on linear regression: Here we obtain the linear regression model of mortality = 4.103 - 0.226localityMetropolitan -1.145localityNonmetropolitan. We expect to see 0.22576 less deaths for every 10000 people in the metropolitan region as compared to all regions. We expect to see 1.14451 more death for every 10000 people in the nonmetropolitan region as compared to all regions.
>>>>>>> 667868adab929cae16d0a55c63183dce21b1a2c5 >>>>>>> 759804787aa318daf2c8131d2fc7dc781ad1c015 >>>>>>> e4585b6277d751d1cb48f325ee19042bf80319a3Does mean percent excess death have significant difference between metro and non-metro groups?
gp_cod_data = cod_data %>%
group_by(year, locality) %>%
summarise(mean_ped = mean(percent_excess_death))
renderPlotly({
gp_cod_data_Metro = gp_cod_data %>%
group_by(year) %>%
filter(locality == "Metropolitan")
gp_cod_data_nonMetro = gp_cod_data %>%
group_by(year) %>%
filter(locality == "Nonmetropolitan")
gp_cod = cbind(gp_cod_data_Metro, gp_cod_data_nonMetro)
p = gp_cod %>%
plot_ly( x = ~year, y = ~mean_ped, name = 'Metropolitan', type = 'scatter', mode = 'markers') %>%
add_trace( y = ~mean_ped1, name = 'Nonmetropolitan', mode = 'markers') %>%
layout(yaxis = list(title = 'mean percent excess death (Metropolitan/Nonmetropolitan)'))
p
})
## function (...)
## {
## if (length(outputArgs) != 0 && !hasExecuted$get()) {
## warning("Unused argument: outputArgs. The argument outputArgs is only ",
## "meant to be used when embedding snippets of Shiny code in an ",
## "R Markdown code chunk (using runtime: shiny). When running a ",
## "full Shiny app, please set the output arguments directly in ",
## "the corresponding output function of your UI code.")
## hasExecuted$set(TRUE)
## }
## if (is.null(formals(origRenderFunc)))
## origRenderFunc()
## else origRenderFunc(...)
## }
<<<<<<< HEAD
## <environment: 0x7fd830af1388>
=======
<<<<<<< HEAD
## <environment: 0x7ff67d6c6788>
=======
<<<<<<< HEAD
## <environment: 0x7faa0ae5aae0>
=======
## <environment: 0x7ffdcf4876d0>
>>>>>>> 667868adab929cae16d0a55c63183dce21b1a2c5
>>>>>>> 759804787aa318daf2c8131d2fc7dc781ad1c015
>>>>>>> e4585b6277d751d1cb48f325ee19042bf80319a3
## attr(,"class")
## [1] "shiny.render.function" "function"
## attr(,"outputFunc")
## function (outputId, width = "100%", height = "400px", inline = FALSE)
## {
## htmlwidgets::shinyWidgetOutput(outputId = outputId, name = "plotly",
## width = width, height = height, inline = inline, package = "plotly")
## }
## <environment: namespace:plotly>
## attr(,"outputArgs")
## list()
## attr(,"hasExecuted")
## <Mutable>
## Public:
## clone: function (deep = FALSE)
## get: function ()
## set: function (value)
## Private:
## value: FALSE
This plot shows the gap of mean percent excess death for metropolitan and nonmetropolitan. Nonmetropolitan has a obviously larger mean percent excess death than metropolitan areas. In order to see if the difference is significant, we will run a regression analysis regarding mean excess death and locality.
gp_locality_lm = lm(mean_ped ~locality , data = gp_cod_data)
summary(gp_locality_lm)
##
## Call:
## lm(formula = mean_ped ~ locality, data = gp_cod_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.5667 -1.0325 -0.3470 0.7922 3.2937
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 33.8866 0.4071 83.238 < 2e-16 ***
## localityMetropolitan -2.0599 0.5757 -3.578 0.0012 **
## localityNonmetropolitan 8.0460 0.5757 13.975 1.13e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.35 on 30 degrees of freedom
## Multiple R-squared: 0.9198, Adjusted R-squared: 0.9145
## F-statistic: 172.1 on 2 and 30 DF, p-value: < 2.2e-16
rl4 = broom::tidy(gp_locality_lm)
kable(rl4)
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | 33.886633 | 0.4071070 | 83.237648 | 0.0000 |
| localityMetropolitan | -2.059931 | 0.5757363 | -3.577907 | 0.0012 |
| localityNonmetropolitan | 8.045979 | 0.5757363 | 13.975110 | 0.0000 |
| **interpretation on linear | regression: | ** | ||
| The regression model is me | an percent_e | xcess_death | = 33.89 - 2. | 05*Metropolitan + 8.046Nonmetropolitan. For people in metropolitan, the expected percent excess death has a decrease of 2.06. For people in non-metropolitan aream the expected percent excess death has an increase of 8.046. The p-values show these predictors are both significant. The adjusted R squared is 0.91, which indicates the regression model explains 91% of variation in mean percent excess death due to the variation in locality. |
How should we help nonmetropolitan areas improve public health?
cod_datayr2 = read_csv("./data/NCHS_-_Potentially_Excess_Deaths_from_the_Five_Leading_Causes_of_Death.csv") %>%
clean_names() %>%
na.omit() %>%
filter(!(state == "United States")) %>%
separate(., percent_potentially_excess_deaths, into = c("percent_excess_death"), sep = "%") %>%
mutate(percent_excess_death = as.numeric(percent_excess_death), mortality = observed_deaths/population * 10000, mortality = as.numeric(mortality)) %>%
select(year, cause_of_death, state, benchmark, locality, observed_deaths, population, expected_deaths, potentially_excess_deaths, percent_excess_death, mortality)
## Warning: Too many values at 191748 locations: 1, 2, 3, 4, 5, 6, 7, 8, 9,
## 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...
cod_datayr2 %>%
filter(benchmark == "2010 Fixed")%>%
group_by(cause_of_death,year, locality)%>%
mutate(year_percent_death=mean(percent_excess_death))%>%
distinct(year, cause_of_death, locality, .keep_all = TRUE)%>%
ungroup(year, locality)%>%
ggplot(aes(x = year, y = year_percent_death, color = cause_of_death))+
geom_point(size = 1)+
geom_line(size = 1) +
facet_grid(.~locality)+
theme(axis.text.x = element_text(angle = 45, hjust = 1))+
scale_x_continuous( breaks=seq(2005,2015,1))+
labs(title = "Average yearly excess death percentage for different locality")
interpretation on linear regression: As can be seen in the plot, if we stratify by cause, we will find some increasing pattern in subgroups(unintentional injury) in certain period of time(2010-2015). In nonmetropolitan regions, unintentional injury and chronic lower respiratory disease have on average the highest excess death percentage in nonmetropolitan areas and needs more attention from public health experts.
cod_datayrlm = cod_datayr2 %>%
filter(locality == "Nonmetropolitan",
benchmark == "2010 Fixed")
lmpop = lm(data=cod_datayrlm, percent_excess_death~year+cause_of_death)
summary(lmpop)
##
## Call:
## lm(formula = percent_excess_death ~ year + cause_of_death, data = cod_datayrlm)
##
## Residuals:
## Min 1Q Median 3Q Max
## -55.456 -9.828 0.576 10.404 39.489
##
## Coefficients:
## Estimate Std. Error
## (Intercept) 769.06335 67.48938
## year -0.36941 0.03358
## cause_of_deathChronic Lower Respiratory Disease 27.06490 0.33901
## cause_of_deathHeart Disease 13.90257 0.32844
## cause_of_deathStroke 12.91441 0.33784
## cause_of_deathUnintentional Injury 31.18470 0.32812
## t value Pr(>|t|)
## (Intercept) 11.39 <2e-16 ***
## year -11.00 <2e-16 ***
## cause_of_deathChronic Lower Respiratory Disease 79.83 <2e-16 ***
## cause_of_deathHeart Disease 42.33 <2e-16 ***
## cause_of_deathStroke 38.23 <2e-16 ***
## cause_of_deathUnintentional Injury 95.04 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 14.92 on 19718 degrees of freedom
## Multiple R-squared: 0.365, Adjusted R-squared: 0.3649
## F-statistic: 2267 on 5 and 19718 DF, p-value: < 2.2e-16
rl5 = broom::tidy(lmpop)
kable(rl5)
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | 769.0633548 | 67.4893807 | 11.39532 | 0 |
| year | -0.3694128 | 0.0335766 | -11.00209 | 0 |
| cause_of_deathChronic Lower Respiratory Disease | 27.0649028 | 0.3390124 | 79.83454 | 0 |
| cause_of_deathHeart Disease | 13.9025703 | 0.3284411 | 42.32896 | 0 |
| cause_of_deathStroke | 12.9144132 | 0.3378397 | 38.22645 | 0 |
| cause_of_deathUnintentional Injury | 31.1847036 | 0.3281228 | 95.03972 | 0 |
From the linear model based on 2010 fixed benchmark, we can tell that controlling for cause of death, the excess death percentage is expected to decrease by 0.37 each year. Overall, the excess death percentage shows a decreasing pattern with time. Besides, in the same year, compared with cancer, other 4 causes have significantly higher excess death percentage. Herein unintentional injury and chronic lower respiratory disease have on average the highest excess death percentage.
For each of the five cause of death, what is the mean percent of excess death? It is different across locality. As shown in the boxplots for all three regions, the rank of the five cause of death remain the same for different localities. For all five causes, the mean percent of death is lower for Metropolitan and higher for Nonmetropolotan.
cod_data %>%
=======
<<<<<<< HEAD
For each of the five cause of death, what is the mean percent of excess death? It is different across locality. yz3306: As shown in the boxplots for all three regions, the rank of the five cause of death remain the same for different localities. For all five causes, the mean percent of death is lower for Metropolitan and higher for Nonmetropolotan, with respect to All Locality.
cod_data %>%
=======
<<<<<<< HEAD
For each of the five cause of death, what is the mean percent of excess death? It is different across locality. As shown in the boxplots for all three regions, the rank of the five cause of death remain the same for different localities. For all five causes, the mean percent of death is lower for Metropolitan and higher for Nonmetropolotan.
=======
For each of the five cause of death, what is the mean percent of excess death? It is different across locality.
<<<<<<< HEAD As shown in the boxplots for all three regions, the rank of the five cause of death remain the same for different localities. For all five causes, the mean percent of death is lower for Metropolitan and higher for Nonmetropolotan.
>>>>>>> 667868adab929cae16d0a55c63183dce21b1a2c5
cod_data %>%
>>>>>>> 759804787aa318daf2c8131d2fc7dc781ad1c015
>>>>>>> e4585b6277d751d1cb48f325ee19042bf80319a3
select(state, locality, percent_excess_death, hhs_region, cause_of_death) %>%
filter(locality != "All") %>%
mutate(hhs_region = as.factor(hhs_region)) %>%
group_by(cause_of_death, locality) %>%
summarise(mean_ped = mean(percent_excess_death)) %>%
ungroup() %>%
ggplot(aes(x = cause_of_death, y = mean_ped, fill = cause_of_death)) +
geom_col() +
facet_grid(~locality) +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
<<<<<<< HEAD
=======
<<<<<<< HEAD
interpretation on linear regression: yz3306:Regress percent of excess death on cause of death, which is a categorical variable. with respect to cancer, the least of the five causes, all 4 have additional percent of death as evaluated by the estimates. The results give statisically significant estimates .
=======
<<<<<<< HEAD
>>>>>>> e4585b6277d751d1cb48f325ee19042bf80319a3

For the above graph, if we seperate the effect of the gap between urban and rural area into different HHS region(Figure “above”) as mentioned, the following chart gives detailed information.
cod_data %>%
select(state, locality, percent_excess_death, hhs_region, cause_of_death) %>%
filter(locality != "All") %>%
mutate(hhs_region = as.factor(hhs_region)) %>%
group_by(cause_of_death, locality,hhs_region) %>%
summarise(mean_ped = mean(percent_excess_death)) %>%
ggplot(aes(x = hhs_region, y = mean_ped, fill = hhs_region)) +
geom_col() +
facet_grid(~cause_of_death) +
theme(axis.text.x = element_text(angle = 15)) +
labs(
y = "Mean %Excess Death",
x = "HHS Regions",
fill = "HHS Region"
)

In order to take the effect of Localtiy into account, Flexdashboard with Shiny will be helpful. the link to the Shiny page is Here - “Comparison of Difference in Mean %Excess Death of Five Leading Causes of Death in each HHS regions between Metropolitan and Nonmetropolitan Area”.
Regress percent of excess death on cause of death, which is a categorical variable. with respect to cancer, the least of the five causes, all 4 have additional percent of death as evaluated by the estimates. The results give statisically significant estimates .
interpretation on linear regression: yz3306:Regress percent of excess death on cause of death, which is a categorical variable. with respect to cancer, the least of the five causes, all 4 have additional percent of death as evaluated by the estimates. The results give statisically significant estimates.
cod_lm_yz1 = lm(percent_excess_death ~cause_of_death, data = cod_data)
rl6 = broom::tidy(cod_lm_yz1)
kable(rl6)
term
estimate
std.error
statistic
p.value
(Intercept)
21.60686
0.0839908
257.25277
0
cause_of_deathChronic Lower Respiratory Disease
21.08880
0.1209180
174.40579
0
cause_of_deathHeart Disease
10.89773
0.1187990
91.73246
0
cause_of_deathStroke
12.92196
0.1204678
107.26483
0
cause_of_deathUnintentional Injury
25.77100
0.1187583
217.00389
0
interpretation on linear regression: yz3306:Regress percent of excess death on cause of death, which is a categorical variable. with respect to cancer, the least of the five causes, all 4 have additional percent of death as evaluated by the estimates. The results give statisically significant estimates . >>>>>>> 2e142df512c521796baa4d407ffc2f6d3065ae49
>>>>>>> 759804787aa318daf2c8131d2fc7dc781ad1c015cod_lm_yz1 = lm(percent_excess_death ~cause_of_death, data = cod_data)
broom::tidy(cod_lm_yz1)
## term estimate std.error
## 1 (Intercept) 21.60686 0.08399078
## 2 cause_of_deathChronic Lower Respiratory Disease 21.08880 0.12091802
## 3 cause_of_deathHeart Disease 10.89773 0.11879905
## 4 cause_of_deathStroke 12.92196 0.12046778
## 5 cause_of_deathUnintentional Injury 25.77100 0.11875825
## statistic p.value
## 1 257.25277 0
## 2 174.40579 0
## 3 91.73246 0
## 4 107.26483 0
## 5 217.00389 0
<<<<<<< HEAD
map_cod_data = cod_data %>%
filter(locality == "Metropolitan") %>%
select(state, locality, percent_excess_death) %>%
group_by(state) %>%
summarise(mean_ped = mean(percent_excess_death)) %>%
dplyr::filter(!(state == "District of\nColumbia"))
map = as.tibble(fifty_states) %>%
group_by(id) %>%
summarize(clong = mean(long), clat = mean(lat)) %>%
filter(!(id == "district of columbia"))
df <- cbind(map, state.abb, state.center, rate = unique(map_cod_data$mean_ped))
ggplot(df, aes(map_id = id)) +
geom_map(aes(fill = rate), map = fifty_states) +
expand_limits(x = fifty_states$long, y = fifty_states$lat) +
labs(x = "", y = "") +
theme(panel.background = element_blank(),
axis.text.x = element_blank(),
axis.text.y = element_blank(),
axis.ticks = element_blank()) +
geom_text(aes(x = clong, y = clat, label = state.abb)) +
scale_fill_gradient(low="gold", high="red")

Here we have generated a map of the United States bases with color gradient showing the mean percent of excess death rate regardless of the locality. Here is a Shiny link for a closer look of the national distribution of mean percent excess death varied by locality – metropolitan and nonmetropolitan areas. We observe for both metropolitan and nonmetropolitan areas, the Southeast region appears to have significantly high mean percent of excess death rate than the others, but the rural-urban gap is not apparent. This can be explained by the geographic features of the Southeast region.
<<<<<<< HEAD
If we compare this map with the poverty map of the US, we see that states with higher mean percent excess death is more likely to be in the poverty area.
If we compare this map with the poverty map of the US, we see that states with higher mean percent excess death is more likely to be in the poverty area.
<<<<<<< HEAD ======= =======
If we compare this map with the poverty map of the US, we see that states with higher mean percent excess death is more likely to be in the poverty area.